A Dive into the Global Housing Market from 2015-2024¶
- Student: Lewis King
- ID: 000051992
#This code imports the Libraries needed to open the data frame and visualize it in graphs.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from matplotlib.animation import FuncAnimation
#This will allow pandas to read and open the dataset
df = pd.read_csv("global_housing_market_extended (2).csv")
df
| Country | Year | House Price Index | Rent Index | Affordability Ratio | Mortgage Rate (%) | Inflation Rate (%) | GDP Growth (%) | Population Growth (%) | Urbanization Rate (%) | Construction Index | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2015 | 117.454012 | 116.550001 | 9.587945 | 4.493292 | 1.514121 | -0.752044 | -0.796707 | 85.985284 | 118.089201 |
| 1 | USA | 2016 | 150.807258 | 51.440915 | 11.729189 | 5.662213 | 1.880204 | -0.545400 | -0.358084 | 69.127267 | 111.980515 |
| 2 | USA | 2017 | 123.194502 | 70.386040 | 8.506676 | 2.197469 | 2.398940 | 0.930895 | 0.596245 | 83.555279 | 85.973903 |
| 3 | USA | 2018 | 131.423444 | 91.469020 | 3.418054 | 4.537724 | 1.608407 | -1.479587 | 2.321099 | 88.968961 | 134.671788 |
| 4 | USA | 2019 | 110.461377 | 56.837048 | 9.158097 | 3.700762 | 1.293249 | 1.961415 | -0.879640 | 87.279612 | 90.702399 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | UAE | 2020 | 160.153711 | 117.170365 | 7.390686 | 2.048681 | 4.061737 | 1.635019 | 1.955250 | 62.942477 | 109.059291 |
| 196 | UAE | 2021 | 95.004866 | 72.727315 | 9.636214 | 3.880091 | 2.943274 | 1.155810 | 0.608064 | 83.550496 | 141.366775 |
| 197 | UAE | 2022 | 175.533469 | 105.083237 | 5.838662 | 4.940674 | 3.344420 | 0.037365 | 1.943051 | 61.152790 | 142.140959 |
| 198 | UAE | 2023 | 126.147746 | 94.604103 | 8.934185 | 5.975589 | 4.638353 | 2.911469 | -0.766718 | 75.552241 | 82.013520 |
| 199 | UAE | 2024 | 153.743377 | 85.855534 | 9.122050 | 1.708365 | 1.051148 | 3.730587 | -0.747705 | 62.137702 | 70.968678 |
200 rows × 11 columns
#This function shows the shape of the dataset
df.shape
(200, 11)
The shape shows that the dataset has 200 rows and 11 columns.
#This function is going to display the first 5 rows of the dataset.
df.head()
| Country | Year | House Price Index | Rent Index | Affordability Ratio | Mortgage Rate (%) | Inflation Rate (%) | GDP Growth (%) | Population Growth (%) | Urbanization Rate (%) | Construction Index | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2015 | 117.454012 | 116.550001 | 9.587945 | 4.493292 | 1.514121 | -0.752044 | -0.796707 | 85.985284 | 118.089201 |
| 1 | USA | 2016 | 150.807258 | 51.440915 | 11.729189 | 5.662213 | 1.880204 | -0.545400 | -0.358084 | 69.127267 | 111.980515 |
| 2 | USA | 2017 | 123.194502 | 70.386040 | 8.506676 | 2.197469 | 2.398940 | 0.930895 | 0.596245 | 83.555279 | 85.973903 |
| 3 | USA | 2018 | 131.423444 | 91.469020 | 3.418054 | 4.537724 | 1.608407 | -1.479587 | 2.321099 | 88.968961 | 134.671788 |
| 4 | USA | 2019 | 110.461377 | 56.837048 | 9.158097 | 3.700762 | 1.293249 | 1.961415 | -0.879640 | 87.279612 | 90.702399 |
# This function is going to output the general information of the Dataset
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 200 entries, 0 to 199 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Country 200 non-null object 1 Year 200 non-null int64 2 House Price Index 200 non-null float64 3 Rent Index 200 non-null float64 4 Affordability Ratio 200 non-null float64 5 Mortgage Rate (%) 200 non-null float64 6 Inflation Rate (%) 200 non-null float64 7 GDP Growth (%) 200 non-null float64 8 Population Growth (%) 200 non-null float64 9 Urbanization Rate (%) 200 non-null float64 10 Construction Index 200 non-null float64 dtypes: float64(9), int64(1), object(1) memory usage: 17.3+ KB
The output above shows the general information of the data showing that there are 11 different columns, null value count, and the data type. This data set includes numerical data and categorical data but mostly numerical.
# This code shows if there are null values in any of the columns and finds the sum of the null values per column
df.isnull().sum()
Country 0 Year 0 House Price Index 0 Rent Index 0 Affordability Ratio 0 Mortgage Rate (%) 0 Inflation Rate (%) 0 GDP Growth (%) 0 Population Growth (%) 0 Urbanization Rate (%) 0 Construction Index 0 dtype: int64
The output above shows that there are no missing values to deal with in this dataset. However, if there were missing values I would first copy the dataset into a new file then using the new dataset I would replace the null values with the statement "Unknown" for a categorical column, and for numerical columns, I would find the average of the column and replace the missing values with the average.
# This will check for duplicates and find the sum of the number of duplicates
duplicates= df.duplicated().sum()
print("Number of duplicates:", duplicates)
Number of duplicates: 0
# This will display a list of all the numerical columns and provide statical data on them
df.describe()
| Year | House Price Index | Rent Index | Affordability Ratio | Mortgage Rate (%) | Inflation Rate (%) | GDP Growth (%) | Population Growth (%) | Urbanization Rate (%) | Construction Index | |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 | 200.000000 |
| mean | 2019.500000 | 130.381022 | 83.048370 | 7.237768 | 4.150621 | 3.649756 | 2.133722 | 0.722806 | 74.769891 | 111.204257 |
| std | 2.879489 | 28.752229 | 21.439858 | 2.576085 | 1.380222 | 1.881938 | 2.413270 | 1.035691 | 8.734246 | 24.309473 |
| min | 2015.000000 | 80.552212 | 50.354311 | 3.041688 | 1.537814 | 0.532110 | -1.921833 | -0.961391 | 60.172760 | 70.968678 |
| 25% | 2017.000000 | 104.142562 | 60.466671 | 5.034207 | 3.045278 | 1.939199 | -0.095628 | -0.183320 | 66.923842 | 90.182198 |
| 50% | 2019.500000 | 129.193653 | 83.721711 | 7.375697 | 4.329643 | 3.664625 | 2.307548 | 0.722372 | 75.100060 | 110.593807 |
| 75% | 2022.000000 | 157.127098 | 100.604665 | 9.276196 | 5.217706 | 5.261720 | 4.272776 | 1.621317 | 82.677430 | 133.780445 |
| max | 2024.000000 | 179.971767 | 119.855388 | 11.879671 | 6.485623 | 6.912349 | 5.958931 | 2.497948 | 89.788944 | 149.735748 |
df_new=df.copy()
# This will remove' (%)' from the column names in the DataFrame to make it easier to code with.
df_new.columns = df_new.columns.str.replace(' (%)', '')
df_new
| Country | Year | House Price Index | Rent Index | Affordability Ratio | Mortgage Rate | Inflation Rate | GDP Growth | Population Growth | Urbanization Rate | Construction Index | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2015 | 117.454012 | 116.550001 | 9.587945 | 4.493292 | 1.514121 | -0.752044 | -0.796707 | 85.985284 | 118.089201 |
| 1 | USA | 2016 | 150.807258 | 51.440915 | 11.729189 | 5.662213 | 1.880204 | -0.545400 | -0.358084 | 69.127267 | 111.980515 |
| 2 | USA | 2017 | 123.194502 | 70.386040 | 8.506676 | 2.197469 | 2.398940 | 0.930895 | 0.596245 | 83.555279 | 85.973903 |
| 3 | USA | 2018 | 131.423444 | 91.469020 | 3.418054 | 4.537724 | 1.608407 | -1.479587 | 2.321099 | 88.968961 | 134.671788 |
| 4 | USA | 2019 | 110.461377 | 56.837048 | 9.158097 | 3.700762 | 1.293249 | 1.961415 | -0.879640 | 87.279612 | 90.702399 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | UAE | 2020 | 160.153711 | 117.170365 | 7.390686 | 2.048681 | 4.061737 | 1.635019 | 1.955250 | 62.942477 | 109.059291 |
| 196 | UAE | 2021 | 95.004866 | 72.727315 | 9.636214 | 3.880091 | 2.943274 | 1.155810 | 0.608064 | 83.550496 | 141.366775 |
| 197 | UAE | 2022 | 175.533469 | 105.083237 | 5.838662 | 4.940674 | 3.344420 | 0.037365 | 1.943051 | 61.152790 | 142.140959 |
| 198 | UAE | 2023 | 126.147746 | 94.604103 | 8.934185 | 5.975589 | 4.638353 | 2.911469 | -0.766718 | 75.552241 | 82.013520 |
| 199 | UAE | 2024 | 153.743377 | 85.855534 | 9.122050 | 1.708365 | 1.051148 | 3.730587 | -0.747705 | 62.137702 | 70.968678 |
200 rows × 11 columns
# This will filter the dataset to include only rows where Mortgage Rate is greater than 0. Select the Mortgage Rate column for those rows, then display it.
mort = df_new[df_new['Mortgage Rate'] > 0][['Mortgage Rate']]
mort
| Mortgage Rate | |
|---|---|
| 0 | 4.493292 |
| 1 | 5.662213 |
| 2 | 2.197469 |
| 3 | 4.537724 |
| 4 | 3.700762 |
| ... | ... |
| 195 | 2.048681 |
| 196 | 3.880091 |
| 197 | 4.940674 |
| 198 | 5.975589 |
| 199 | 1.708365 |
200 rows × 1 columns
Because this outcome shows all 200 rows we no the are no impossible numbers in the Mortgage Rate.
# Prints the number of unique countries in the Country column
print("Unique countries:", df_new['Country'].nunique())
# Prints the unique years present in the Year column
print("Year range:", df_new['Year'].unique())
Unique countries: 20 Year range: [2015 2016 2017 2018 2019 2020 2021 2022 2023 2024]
# Defines a list of important columns to focus on
important_columns = ['House Price Index', 'Rent Index', 'Mortgage Rate', 'Affordability Ratio']
# Displays descriptive statistics for the important columns
df_new[important_columns].describe()
| House Price Index | Rent Index | Mortgage Rate | Affordability Ratio | |
|---|---|---|---|---|
| count | 200.000000 | 200.000000 | 200.000000 | 200.000000 |
| mean | 130.381022 | 83.048370 | 4.150621 | 7.237768 |
| std | 28.752229 | 21.439858 | 1.380222 | 2.576085 |
| min | 80.552212 | 50.354311 | 1.537814 | 3.041688 |
| 25% | 104.142562 | 60.466671 | 3.045278 | 5.034207 |
| 50% | 129.193653 | 83.721711 | 4.329643 | 7.375697 |
| 75% | 157.127098 | 100.604665 | 5.217706 | 9.276196 |
| max | 179.971767 | 119.855388 | 6.485623 | 11.879671 |
# Group the DataFrame by Year and calculate the mean House Price Index for each year
price_over_time = df_new.groupby('Year')['House Price Index'].mean()
# Plot the average House Price Index over time with markers at each data point
price_over_time.plot(marker='o')
# Set the title of the plot
plt.title('Global Average House Price Index Over Time')
# Label the x-axis and y-axis, and add a grid to the plot
plt.xlabel('Year')
plt.ylabel('House Price Index')
plt.grid(True)
# Display the plot
plt.show()
This line plot shows the Global Average House Price Index Over Time. Some observations to be noticed are that from 2015 to 2024, the global average house price index has generally increased, indicating rising housing costs worldwide. Then, during the COVID-19 pandemic years 2020–2021, instead of falling as would be expected during an economic recession, the average house price index rose sharply. And after the pandemic, the house price index still continued to rise.
# Groups each selected year from the year column into one value.
covid_years = df_new[df_new['Year'].isin([2019, 2020, 2021, 2022])]
# Group by country and year and calculate the mean of the house price index
trend = covid_years.groupby(['Country', 'Year'])['House Price Index'].mean().reset_index()
plt.figure(figsize=(15, 6))
# Plots bar plots for each year
sns.barplot(x='Year', y='House Price Index', hue='Country', data=trend, palette='Set1', edgecolor='black', linewidth=1.5)
# Adds labels and a title shows the plot
plt.title(' House Price Index by Country during Covid-19 (2019-2022)')
plt.xlabel('Year')
plt.ylabel('Average House Price Index')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')# Show the plot
plt.show()
After noticing a sharp increase in the house price index during the years of the pandemic, I decided to look into it more thoroughly, looking at each country in the dataset. This Bar plot confirms that there was a sharp increase in the average House Price Index after 2020 across all countries due to the COVID-19 pandemic and the global economic recession that followed due to COVID-19. Another thing to notice is that most countries reached their peak housing prices in 2021, likely due to very low mortgage rates and lifestyle changes during the pandemic because of remote work.
# Creates a list of key housing factors to analyze
key_vaules = ['House Price Index', 'Rent Index', 'Affordability Ratio', 'Mortgage Rate', 'Inflation Rate', 'GDP Growth', 'Population Growth', 'Urbanization Rate', 'Construction Index']
# Makes the correlation matrix for the selected key housing factors
matrix = df_new[key_vaules].corr()
plt.figure(figsize=(12, 8))
# Plot the heatmap of the correlation matrix with annotations and a color map
sns.heatmap(matrix, annot=True, cmap='coolwarm', linewidths=0.5)
# Set the title of the heatmap then displays it
plt.title('Correlation Matrix of Key Housing Factors')
plt.xticks(rotation=45)
plt.show()
The correlation matrix of key housing factors shows a positive correlation between the House Price Index and Rent Index, indicating that as house prices rise, rent prices tend to follow. Negative correlations between GDP Growth and Inflation Rate suggest that higher inflation may be associated with lower economic growth. Overall, the matrix provides valuable insights into the interconnectedness of housing market factors, aiding in informed decision-making and strategy development.
# Groups the dataset by Country and calculates the average House Price Index for each country, then selects the top 10 countries
ave = df_new.groupby('Country')['House Price Index'].mean().sort_values(ascending=False).head(10)
# Plot a bar plot for the top 10 countries by average house price index
sns.barplot(x=ave.values, y=ave.index, hue=ave.index, palette='dark', legend=True)
plt.title('Top 10 Countries by Average House Price Index')
plt.xlabel('Average House Price Index')
plt.ylabel('Country')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
The bar plot showing the Top 10 Countries by Average House Price Index highlights that nations like Switzerland, Brazil, and the Netherlands consistently have the highest housing prices from 2015 to 2024. Which makes sense because these countries have a strong urban demand and limited housing supply. Overall, the results show that the highest house prices are concentrated in highly urbanized, economically strong countries with limited housing availability.
# Create a histogram of the Mortgage Rate column with 35 bins and a line to show density
sns.histplot(df_new['Mortgage Rate'], bins=35, kde=True, color='skyblue')
plt.title('Mortgage Rate Distribution')
plt.xlabel('Mortgage Rate')
plt.ylabel('Frequency')
plt.show()
The distribution of mortgage rates shows that most countries experienced moderately low mortgage rates, with the majority of rates clustered between 2% and 6%. There is a peak around 4.5% suggesting this was the most common mortgage rate during the 2015–2024 period. Overall, the distribution supports the idea that low mortgage rates played a major role in causing housing prices to increase globally during 2015-2024.
# Creates a scatter plot to show the relationship between the House Price Index and the Rent Index. While the points are colored by Country, with black edges around the points.
sns.scatterplot(x='House Price Index', y='Rent Index', data=df_new, hue='Country',edgecolor='black', alpha=1)
plt.title('House Price Index vs Rent Index')
plt.xlabel('House Price Index')
plt.ylabel('Rent Index')
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
The graph above shows a scatter plot comparing the House Price Index and Rent Index, showing a positive relationship between the two variables. What can be seen is that as house prices increase, rental prices also tend to rise. However, the data points are still pretty spread out, suggesting that while the two variables are related, the strength of the relationship varies by country.
# Creates an interactive scatter plot using fig.write_image("plot.png") to show the relationship between the House Price Index and the Rent Index, the points are colored by Country.
fig = px.scatter(df_new, x='House Price Index', y='Rent Index', color='Country', title='House Price Index vs Rent Index')
# Display the interactive plot
fig.show()
fig.write_html("house_price_vs_rent.html")
This scatter plot displays the same ideas above However, I chose to show it again using a different method to make it easier to dive deeper into the points on the graph.
# Groups the dataset by Year and calculates the average Affordability Ratio for each year
aff_over_t = df_new.groupby('Year')['Affordability Ratio'].mean()
# Plots the average Affordability Ratio over time, using red color and circular markers at each data point
aff_over_t.plot(marker='o', color='red')
plt.title('Affordability Index Over Time')
plt.xlabel('Year')
plt.ylabel('Average Affordability Index')
plt.grid(True)
plt.show()
This line plot shows the Affordability Ratio over time and reveals trends and anomalies in affordability. One thing noticed is that from 2015 to 2017 houses were more affordable, but from 2018 to 2020 shows an all-time high, which further suggests that the Covid-19 years had an extreme effect on housing prices, causing them to increase and affect the market globally. However, from 2022 to 2024, we can see the rates start to stabilize again and show that the housing market is becoming predictable again.
# This groups the dataset by Country and calculates the average House Price Index for each country
geo_map = df_new.groupby('Country').agg({'House Price Index':'mean'}).reset_index()
# Creates an interactive choropleth map using Plotly Express to visualize the average House Price Index by country
fig = px.choropleth(geo_map, locations='Country', locationmode='country names', color='House Price Index', color_continuous_scale='plasma', title='Average House Price Index by Country')
fig.show()
# Save as an interactive HTML file
fig.write_html("average_house_price_by_country.html")
The choropleth map of the average house price index by country shows countries like the USA, Australia, and the UK have expected high house price indexes, displaying more expensive housing markets. However, countries such as Mexico and South Africa have lower indexes, suggesting more affordable housing. One surprising thing to notice about North America is the very stark difference between the USA, Canada, and Mexico. The USA shows the highest price index and I expect Canada to follow, but out of the whole of North America, they have the lower price index.
Insights:¶
- Affordability has generally declined over time across all countries around the world.
- Certain countries face critical affordability issues, especially urbanized and highly developed areas like the USA.
- Rising housing costs strongly correlate with mortgage rate fluctuations and inflation and each one of these variables affects the other.
- A limitations might include the exclusion of certain countries.
# Filter the DataFrame to include only rows for the countries 'USA', 'Canada', and 'Mexico'. Then creates an interactive line plot to visualize the trend of House Price Index over time for these countries
fig = px.line(df_new[df_new['Country'].isin(['USA', 'Canada', 'Mexico'])], x='Year', y='House Price Index', color='Country', title='Trend of House Price Index over Time for the USA, Canada, and UK')
fig.show()
# Save as an interactive HTML file
fig.write_html("house_price_trend_usa_canada_mexico.html")
The line plot of the House Price Index trends for the USA, Canada, and Mexico from 2015 to 2024 reveals overall rising housing prices in all three countries in North America. The USA shows a steady increase with peaks around 2020 and 2024, showing a strong increase in house prices. Canada displays more sudden changes, with fluctuations and a peak around 2019 followed by a decline and a rise. Mexico experiences sharp increases around 2018 and 2023, suggesting periods of fast increase.
df_new.to_csv('global_housing_market_extended (2)_cleaned.csv')